Improving Language Identification of Web Page Using Optimum Profile
Abstract
Language is an indispensable tool for human communication, and at present the language that dominates the Internet is English. Language identification is the process of automatically determining which predetermined language a given piece of content is written in. The ability to identify languages other than English is highly desirable, and the goal of this research is to improve the methods used to do so. Three methods are studied: distance measurement, the Boolean method, and the proposed method, the optimum profile. Initial experiments showed that distance measurement and the Boolean method are not reliable for identifying the language of European web pages. We therefore propose the optimum profile, which uses N-gram frequency and N-gram position to identify the language of a web page. The results show that the proposed method gives the highest performance, with an accuracy of 91.52%.
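The abstract does not spell out how the optimum profile is constructed, so the following is only a minimal sketch of the general idea it builds on: a character N-gram profile ranked by frequency, compared against per-language profiles by rank position (an out-of-place distance in the style of Cavnar and Trenkle). It is not the authors' method. The function names, the choice of n=3, and the profile size top_k=300 are all assumptions made for illustration.

```python
from collections import Counter


def ngram_profile(text, n=3, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    # Normalise case and whitespace, and pad so word boundaries form n-grams.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place_distance(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    position = {gram: rank for rank, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)  # assumed penalty for an unseen n-gram
    return sum(
        abs(rank - position[gram]) if gram in position else max_penalty
        for rank, gram in enumerate(doc_profile)
    )


def identify(text, lang_profiles, n=3, top_k=300):
    """Return the language whose profile is closest to the profile of text."""
    doc = ngram_profile(text, n, top_k)
    return min(
        lang_profiles,
        key=lambda lang: out_of_place_distance(doc, lang_profiles[lang]),
    )
```

In use, one profile would be built per language from training text, for example `profiles = {"en": ngram_profile(english_text), "de": ngram_profile(german_text)}`, and `identify(page_text, profiles)` would return the key of the closest profile. Here `english_text`, `german_text`, and `page_text` are hypothetical inputs.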
Similar resources
Improving Query Translation for Cross-Language Information Retrieval using a Web-based Approach
With the increasing popularity of the Internet, research on Cross-Language Information Retrieval (CLIR) is receiving much attention. Existing approaches to improving query translation, such as noun phrase (NP) identification, translation, and word translation selection, require special corpus resources. However, those natural language resources are not readily available. In this paper, we propos...
The Web Mashup Scripting Language Profile
This paper provides an overview of the Web Mashup Scripting Language (WMSL) and discusses the WMSL-Profile. It specifies the HTML encoding that is used to import Web Service Description Language (WSDL) files and metadata, in the form of mapping relations, into a WMSL web page. Furthermore, the WMSL-Profile describes the conventions used to parse the WMSL pages. It is envisioned that these WMSL ...
A Technique for Improving Web Mining using Enhanced Genetic Algorithm
The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined, and their accuracy is not high enough. One method for web mining is evolutionary algorithms, which search according to the user interests...
Web page language identification based on URLs
Given only the URL of a web page, can we identify its language? This is the question that we examine in this paper. Such a language classifier is, for example, useful for crawlers of web search engines, which frequently try to satisfy certain language quotas. To determine the language of uncrawled web pages, they have to download the page, which might be wasteful, if the page is not in the desi...
The Impact of Computer–Assisted Language Learning (CALL) /Web-Based Instruction on Improving EFL Learners’ Pronunciation Ability
The purpose of this study was to investigate the effect of CALL/Web-based instruction on improving EFL learners’ pronunciation ability. To this end, 85 students who were enrolled in a language institute in Rasht were selected as subjects. These students were given the Oxford Placement Test in order to validate their proficiency levels. They were then divided into two groups of 30 and were...
Publication date: 2011